In this notebook I relate the location of the main food-related venues in the city of Toronto to the city's main business development areas. The goal is to find the percentage of restaurants among all venues in each area and to compare that proportion across areas. On this basis, a new restaurant should be placed in the best business area with the lowest restaurant ratio.
I provide a study that evaluates the relationship between the total number of venues in each area and the number of food-related ones. The lowest ratio indicates where to place the restaurant.
import pandas as pd
import numpy as np
import matplotlib.cm as cm
from scipy.spatial import distance_matrix
import matplotlib.colors as colors
import folium # plotting library
from sklearn.cluster import KMeans
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
from math import cos, asin, sqrt
%matplotlib inline
# Foursquare data:
df_square=pd.read_csv('Toronto_data.csv')
df_square.head()
df_areas=pd.read_csv('Business Improvement Areas Data.csv')
df_areas.head()
In the code below we assign each venue to its closest business area, based on the distance to the center of the area. This lets us count the total number of venues per AREA, which gives a measure of the area's size.
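def distance(lat1, lon1, lat2, lon2):
    # Haversine distance in km; p converts degrees to radians (pi / 180)
    p = 0.017453292519943295
    a = 0.5 - cos((lat2-lat1)*p)/2 + cos(lat1*p)*cos(lat2*p) * (1-cos((lon2-lon1)*p)) / 2
    return 12742 * asin(sqrt(a))  # 12742 km = 2 * Earth radius

def closest(data, v):
    # Name of the area in `data` whose center is nearest to point `v`
    return min(data, key=lambda p: distance(v['LATITUDE'],v['LONGITUDE'],p['LATITUDE'],p['LONGITUDE']))['AREA_NAME']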
def find_area():
    # Collect the center coordinates and name of every business area
    tempData = []
    for index, row in df_areas.iterrows():
        tempDict = {}
        tempDict['LATITUDE']=row['LATITUDE']
        tempDict['LONGITUDE']=row['LONGITUDE']
        tempDict['AREA_NAME']=row['AREA_NAME']
        tempData.append(tempDict)
    # Assign each Foursquare venue to its closest area
    return_value=[]
    for index, row in df_square.iterrows():
        temp_results = {}
        tempRow = {'LATITUDE': row['Venue Latitude'], 'LONGITUDE': row['Venue Longitude']}
        temp_results['AREA']=closest(tempData,tempRow)
        temp_results['Venue Category']=row['Venue Category']
        temp_results['Venue Latitude']=row['Venue Latitude']
        temp_results['Venue Longitude']=row['Venue Longitude']
        return_value.append(temp_results)
    return return_value
df_square['AREA']=""
df_temp_rest=find_area()
df = pd.DataFrame(df_temp_rest, columns =['AREA', 'Venue Category' ,'Venue Latitude','Venue Longitude' ])
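The nearest-area assignment above loops over every venue and every area in Python. As a minimal sketch of an alternative (not the method used in this notebook), the same haversine distance can be computed for all venue/area pairs at once with NumPy broadcasting. The function name `nearest_area` and the toy coordinates are hypothetical; the column names are assumed to match `df_square` and `df_areas`.

```python
import numpy as np
import pandas as pd

def nearest_area(venues, areas):
    """Return, for each venue, the AREA_NAME of the closest area center.

    Hypothetical helper: assumes the column names used in this notebook.
    """
    # Venues as a column vector, areas as a row vector, so that
    # broadcasting yields a (n_venues, n_areas) distance matrix.
    lat1 = np.radians(venues['Venue Latitude'].to_numpy())[:, None]
    lon1 = np.radians(venues['Venue Longitude'].to_numpy())[:, None]
    lat2 = np.radians(areas['LATITUDE'].to_numpy())[None, :]
    lon2 = np.radians(areas['LONGITUDE'].to_numpy())[None, :]
    # Haversine formula; 12742 km is twice the Earth radius
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    dist_km = 12742 * np.arcsin(np.sqrt(a))
    # Index of the closest area for every venue
    return areas['AREA_NAME'].to_numpy()[dist_km.argmin(axis=1)]

# Toy data (made up for illustration, not taken from the real CSV files)
toy_areas = pd.DataFrame({
    'AREA_NAME': ['A', 'B'],
    'LATITUDE': [43.65, 43.70],
    'LONGITUDE': [-79.38, -79.40],
})
toy_venues = pd.DataFrame({
    'Venue Latitude': [43.651, 43.699],
    'Venue Longitude': [-79.381, -79.401],
})
toy_venues['AREA'] = nearest_area(toy_venues, toy_areas)
```

On the real data this would be called as `nearest_area(df_square, df_areas)`. It builds one distance value per venue/area pair in memory, which is fine for a few thousand venues but may need chunking for much larger inputs.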
data_grouped=df.groupby("AREA")["AREA"].count()
df_n = pd.DataFrame(data_grouped, columns=['AREA'])
df_n.rename(columns={'AREA':'Total'},inplace=True)
df_population=df_n.sort_values(by=['Total'],ascending=False)
df.loc[df['Venue Category'].str.contains("Restaurant"), 'food_related'] = True
df.loc[df['Venue Category'].str.contains("Gastropub"), 'food_related'] = True
#df_square.loc[df_square['Venue Category'].str.contains("Bar"), 'food_related'] = True
df_area_food=df[df['food_related'] == True]
len(df_area_food)
df_area_grouped=df_area_food.groupby("AREA")["AREA"].count()
df_area_grouped = pd.DataFrame(df_area_grouped, columns =['AREA'])
df_area_grouped.rename(columns={'AREA':'Total'},inplace=True)
df_restaurants=df_area_grouped.sort_values(by=['Total'],ascending=False)
Let's find out which business area is the best, based on the proportion of restaurants.
# Number of Items
df_population
# Number of Restaurants
df_restaurants
df_test=df_population.join(df_restaurants, lsuffix='_caller', rsuffix='_other')
df_test['Ratio']=(df_test['Total_other']*100)/df_test['Total_caller']
df_test=df_test.reset_index()
df_test.head()
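One point worth checking in the ratio above: the join keeps every area, but an area with no food-related venue has no row in `df_restaurants`, so its `Ratio` comes out as NaN instead of 0 and is sorted last. The sketch below shows one possible way to compute the ratio so that such areas get 0. It uses made-up toy data and a hypothetical `summary` frame, and is not the method used in this notebook.

```python
import pandas as pd

# Toy venues (made up for illustration): area C has no food-related venue
toy = pd.DataFrame({
    'AREA': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'C'],
    'Venue Category': ['Italian Restaurant', 'Park', 'Gym', 'Gastropub',
                       'Bank', 'Bookstore', 'Sushi Restaurant', 'Park'],
})

# Flag food-related venues in one pass; na=False treats missing categories as non-food
toy['food_related'] = toy['Venue Category'].str.contains('Restaurant|Gastropub', na=False)

# Count all venues and food-related venues per area in a single groupby
summary = toy.groupby('AREA')['food_related'].agg(Total='count', Food='sum')

# Percentage of food-related venues; areas without any get 0 instead of NaN
summary['Ratio'] = summary['Food'] * 100 / summary['Total']
summary = summary.sort_values('Ratio')
```

Whether a ratio of 0 should really rank first is a judgment call: an area with very few venues overall can have a ratio of 0 without being an attractive location, so it may be sensible to also require a minimum `Total`.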
address = 'Toronto'
geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)
# Venues in blue
for lat, lng, venue_type in zip(df['Venue Latitude'], df['Venue Longitude'], df['Venue Category']):
    label = '{}'.format(venue_type)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)
# Business area centers in red
for lat, lng, area_name in zip(df_areas['LATITUDE'], df_areas['LONGITUDE'], df_areas['AREA_NAME']):
    label = '{}'.format(area_name)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)
map_toronto
# Sort the areas by restaurant ratio (restaurants / all venues), lowest first
df_ratio=df_test.sort_values(by=['Ratio'],ascending=True)
df_ratio
In this section we can see that Village of Islington has the lowest restaurant ratio, which makes it the best area for a restaurant.
df_winner = df_areas[df_areas['AREA_NAME'] == 'Village of Islington' ]
df_winner
map_winner = folium.Map(location=[df_winner.iloc[0]['LATITUDE'], df_winner.iloc[0]['LONGITUDE']], zoom_start=20)
label = '{}'.format(df_winner.iloc[0]['AREA_NAME'])
label = folium.Popup(label, parse_html=True)
folium.CircleMarker(
    [df_winner.iloc[0]['LATITUDE'], df_winner.iloc[0]['LONGITUDE']],
    radius=5,
    popup=label,
    color='red',
    fill=True,
    fill_color='#3186cc',
    fill_opacity=0.7,
    parse_html=False).add_to(map_winner)
map_winner
To summarize this exercise: we identified the areas where Toronto has been investing in business development. That investment creates a good ecosystem for new businesses in the area. In our analysis we computed, for each area, the ratio of food-related venues to all venues. Combining both ideas, an area that is growing fast and has a lower restaurant ratio than the others is likely to be a good investment point for a food place.